How Statistical Information from the Web can Help Identify Named Entities

Author

Mathieu Roche

Abstract

This paper presents a Natural Language Processing (NLP) approach to filtering Named Entities (NE) from a list of collocation candidates. NE are defined as the names of 'People', 'Places', 'Organizations', 'Software', 'Illnesses', and so forth. The proposed method identifies NE using statistical measures computed from Web resources. It has three stages: (1) building artificial prepositional collocations from Noun-Noun candidates; (2) measuring the "relevance" of the resulting prepositional collocations using statistical methods (Web Mining); (3) selecting prepositional collocations. An evaluation on Noun-Noun collocations from French and English corpora confirmed the relevance of our system.
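The three stages can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the abstract does not name the prepositions, the statistical measure, or the selection rule. The preposition list, the Dice-style score, the threshold, the decision to flag low-scoring candidates as likely NE, and the `toy_hits` lookup (a stand-in for Web page counts, filled with invented numbers) are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical preposition list; the abstract does not say which are used.
PREPOSITIONS = ["de", "en"]


def build_variants(noun1: str, noun2: str) -> List[str]:
    """Stage 1: build artificial prepositional collocations from a Noun-Noun pair."""
    return [f"{noun1} {prep} {noun2}" for prep in PREPOSITIONS]


def relevance(hits: Callable[[str], int], noun1: str, noun2: str) -> float:
    """Stage 2: score the variants with a Dice-style ratio of hit counts.

    Assumption: the Dice coefficient stands in for whatever Web-based
    measure the paper actually uses.
    """
    denom = hits(noun1) + hits(noun2)
    if denom == 0:
        return 0.0
    return max(2 * hits(variant) / denom for variant in build_variants(noun1, noun2))


def likely_named_entities(
    candidates: List[Tuple[str, str]],
    hits: Callable[[str], int],
    threshold: float = 0.01,  # hypothetical cut-off
) -> List[Tuple[str, str]]:
    """Stage 3: select candidates whose prepositional variants score low.

    Assumption: a weak prepositional variant suggests the pair is an NE.
    """
    return [(a, b) for a, b in candidates if relevance(hits, a, b) < threshold]


# Invented toy counts standing in for Web search hit counts.
TOY_HITS = {
    "fichier": 1000,
    "clusters": 500,
    "fichier de clusters": 60,
    "logiciel": 2000,
    "Excel": 3000,
    "logiciel de Excel": 1,
}


def toy_hits(query: str) -> int:
    return TOY_HITS.get(query, 0)


if __name__ == "__main__":
    pairs = [("fichier", "clusters"), ("logiciel", "Excel")]
    print(likely_named_entities(pairs, toy_hits))
```

In a real setting, `toy_hits` would be replaced by a function that queries a search engine for page counts; the rest of the sketch would stay unchanged.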


Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...


Using Wikipedia's Category Structure for Entity Search

In this paper we investigate how the category structure of Wikipedia can be exploited for Entity Ranking. In the last decade, the Web has not only grown in size, but also changed its character, due to collaborative content creation and an increasing amount of structure. Current Search Engines find Web pages rather than information or knowledge, and leave it to the searchers to locate the sought...


Extracting Semantic Networks among Named Entities from Websites

To enable machine processing of webpages, it is important to identify the relationships among named entities. Named entities, such as people, organizations, and places, are important pieces of information that must be extracted. The scale of the web means that manual extraction is not feasible. We propose a system that automatically constructs a semantic network of named entities from webpages...


High Performance Clustering for Web Person Name Disambiguation Using Topic Capturing

Searching for named entities is a common task on the web. Among different named entities, person names are among the most frequently searched terms. However, many people can share the same name and the current search engines are not designed to identify a specific entity, or a namesake. One possible solution is to identify a namesake through clustering webpages for different namesakes. In this ...


Improving Persian Named Entity Recognition Using the Ezafe (Kasra) Marker

Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.), and organizations (public and private companies, international institutions, etc.), as well as dates, currencies, and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...



Publication date: 2011
